Recovery in Massively Parallel Systems

Authors

  • Geert Deconinck
  • J. Vounckx
  • R. Lauwereins
Abstract

The objective of ESPRIT project 6731 FTMPS [1] is to develop techniques and system software to integrate Fault Tolerance in Massively Parallel Systems [2]. This covers the whole range from error detection, through fault diagnosis and fault isolation, to system and application recovery. The project emphasizes research into applicability in massively parallel systems as well as the development of system software that may be commercialized in future products. The project partners are: Parsytec Computer GmbH (D), British Aerospace Ltd. (UK), Katholieke Universiteit Leuven (B), Universität-GH Paderborn (D) (recently replaced by the Medizinische Universität zu Lübeck), Universität Erlangen-Nürnberg (D) and Universidade de Coimbra (P). Although the Parsytec systems (the PowerXplorer is one of them) served as the development hardware, the developed methodologies and implementations have been kept as hardware independent as possible.

Similar resources

Facing up to the Inevitable: Intelligent Error Recovery in Massively Parallel Processing in Memory Architectures

Massively parallel “Processing-In-Memory” (PIM) architectures have been shown to yield increases in performance due to their “memory-centric” nature. However, as PIM is still a developing technology, advanced issues such as error detection and failure recovery have not yet been addressed. We describe the application of concepts found in our multi-agent system, ADE, to PIM, incorporating its mec...

Massively Parallel Execution Model and Massively Parallel Architecture

The purposes for the research and development of the RWC massively parallel computer project are (1) to efficiently support flexible and integrated computation which are research targets in RWC Project, and (2) to pursue a general purpose massively parallel system efficiently supporting multiple programming paradigms, and (3) to realize a stand-alone system which has a mature operating system. For th...

A massively parallel strategy for STR marker development, capture, and genotyping

Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. He...

A User-triggered Checkpointing Library for Computation-intensive Applications

We propose a method to incorporate coordinated checkpointing and rollback in high performance computing applications on massively parallel computers. A library allows the user to specify which data-items (including files) belong to the contents of the checkpoint, and to trigger the checkpointing in the application. The recovery-line management on the distributed disk system takes care of which ...

Publication date: 2001